{
 "cells": [
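  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## All of the code here is my own; the learning reference is a Bilibili video: [Python Web Scraping in Practice: Batch-Downloading Images from a Website](https://www.bilibili.com/video/BV1qJ411S7F6?from=search&seid=17716982267478077598&spm_id_from=333.337.0.0)"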
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import the required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import re\n",
    "import time\n",
    "import os\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fetch the page source"
   ]
  },
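  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res=requests.get('https://www.photophoto.cn/tupian/kuzitupian.html')  # a stock-image site; Taobao images are hard to recognize because they all show models wearing the clothes\n",
    "# The site sometimes rejects the request. In that case, override the request headers and pass them via requests.get(url, headers=headers).\n",
    "# For example: headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'}"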
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "html=res.text"
   ]
  },
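  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the header override mentioned in the request cell above; it was not part of the original run. The helper name `fetch_html` and the 10-second timeout are assumptions chosen for illustration, and the User-Agent string is the one quoted in that cell's comment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "# Browser-like User-Agent, so the site is less likely to reject the request.\n",
    "HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'}\n",
    "\n",
    "def fetch_html(url, timeout=10):\n",
    "    # Hypothetical helper. timeout=10 is an assumed value; without one, requests may wait indefinitely.\n",
    "    res = requests.get(url, headers=HEADERS, timeout=timeout)\n",
    "    res.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses\n",
    "    return res.text\n",
    "\n",
    "# Usage (commented out so this cell does not hit the network when run):\n",
    "# html = fetch_html('https://www.photophoto.cn/tupian/kuzitupian.html')"
   ]
  },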
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use re.findall() to find the download URLs of all images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['https://css.photophoto.cn/img/logo.gif', 'https://imgb15.photophoto.cn/20201220/heisekuzitupian-40226090_3.jpg', 'https://imgb15.photophoto.cn/20201221/gezikuzitupian-40226088_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40226087_3.jpg', 'https://imgb15.photophoto.cn/20201221/gezikuzitupian-40226086_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40226085_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40226083_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40226082_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40226061_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40226060_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40212451_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212450_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40212449_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212448_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212438_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212437_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40212436_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212435_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212433_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40212432_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40212431_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212018_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212017_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212016_3.jpg', 'https://imgb15.photophoto.cn/20201221/kuzitupian-40212015_3.jpg', 'https://imgb15.photophoto.cn/20201220/kuzitupian-40212014_3.jpg', 'https://imgb15.photophoto.cn/20201128/misekuzitupian-39918024_3.jpg', 'https://imgb15.photophoto.cn/20201128/misekuzitupian-39918022_3.jpg', 'https://imgb15.photophoto.cn/20201119/kuzitupian-39796698_3.jpg', 'https://imgb15.photophoto.cn/20201119/kuzitupian-39796697_3.jpg', 'https://imgb15.photophoto.cn/20201112/yaoguokuzitupian-39673407_3.jpg', 'https://css.photophoto.cn/img/1.gif']\n",
      "32\n"
     ]
    }
   ],
   "source": [
    "urls=re.findall('<img src=\"(.*?)\"',html)\n",
    "print(urls)\n",
    "print(len(urls))"
   ]
  },
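  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The list printed above also contains non-photo assets (`logo.gif` first, `1.gif` last), and the download loop further down skips index 0, presumably for that reason. As a hedged alternative, the sketch below filters by file extension instead of by position. `sample_html` is a made-up fragment that only mimics the page structure, and `all_urls` and `jpg_urls` are illustrative names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Made-up HTML fragment (an assumption for illustration, not the real page source).\n",
    "sample_html = (\n",
    "    '<img src=\"https://css.photophoto.cn/img/logo.gif\">'\n",
    "    '<img src=\"https://imgb15.photophoto.cn/20201220/kuzitupian-40226087_3.jpg\">'\n",
    "    '<img src=\"https://css.photophoto.cn/img/1.gif\">'\n",
    ")\n",
    "\n",
    "# Same pattern as the cell above: capture the src attribute of every <img> tag.\n",
    "all_urls = re.findall('<img src=\"(.*?)\"', sample_html)\n",
    "\n",
    "# Keep only URLs ending in .jpg, which drops the logo and the placeholder GIF.\n",
    "jpg_urls = [u for u in all_urls if u.lower().endswith('.jpg')]\n",
    "print(jpg_urls)"
   ]
  },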
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a folder for the downloads"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "path='clothes_img'\n",
    "if not os.path.exists(path):\n",
    "    os.mkdir(path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download the images into the target folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading image 1\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/heisekuzitupian-40226090_3.jpg\n",
      "Downloading image 2\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/gezikuzitupian-40226088_3.jpg\n",
      "Downloading image 3\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40226087_3.jpg\n",
      "Downloading image 4\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/gezikuzitupian-40226086_3.jpg\n",
      "Downloading image 5\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/kuzitupian-40226085_3.jpg\n",
      "Downloading image 6\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40226083_3.jpg\n",
      "Downloading image 7\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40226082_3.jpg\n",
      "Downloading image 8\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40226061_3.jpg\n",
      "Downloading image 9\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40226060_3.jpg\n",
      "Downloading image 10\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/kuzitupian-40212451_3.jpg\n",
      "Downloading image 11\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40212450_3.jpg\n",
      "Downloading image 12\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/kuzitupian-40212449_3.jpg\n",
      "Downloading image 13\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40212448_3.jpg\n",
      "Downloading image 14\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40212438_3.jpg\n",
      "Downloading image 15\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40212437_3.jpg\n",
      "Downloading image 16\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/kuzitupian-40212436_3.jpg\n",
      "Downloading image 17\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40212435_3.jpg\n",
      "Downloading image 18\n",
      "Download URL: https://imgb15.photophoto.cn/20201220/kuzitupian-40212433_3.jpg\n",
      "Downloading image 19\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/kuzitupian-40212432_3.jpg\n",
      "Downloading image 20\n",
      "Download URL: https://imgb15.photophoto.cn/20201221/kuzitupian-40212431_3.jpg\n"
     ]
    }
   ],
   "source": [
    "# Download 20 images\n",
    "for i in range(1,21):   # start at 1: urls[0] is the site logo (logo.gif), not a photo\n",
    "    time.sleep(1)  # pause between requests so the site is not hit too quickly and overloaded\n",
    "    file_name=urls[i].split(\"/\")[-1]\n",
    "    res=requests.get(urls[i])\n",
    "    print(\"Downloading image %d\"%i)\n",
    "    print(\"Download URL: \"+urls[i])\n",
    "    with open(path+'/'+file_name,'wb') as f:\n",
    "        f.write(res.content)  # the with-block closes the file, so no f.close() is needed"
   ]
  },
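  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hedged variant of the loop above, wrapped in a function. The name `download_images`, the `delay` and `timeout` defaults, and the choice to skip non-200 responses are assumptions for illustration, not the original approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import requests\n",
    "\n",
    "def download_images(urls, folder, delay=1.0, timeout=10):\n",
    "    # Hypothetical helper. exist_ok=True avoids an error when the folder already exists.\n",
    "    os.makedirs(folder, exist_ok=True)\n",
    "    saved = []\n",
    "    for i, url in enumerate(urls, start=1):\n",
    "        time.sleep(delay)  # pause between requests, as in the loop above\n",
    "        res = requests.get(url, timeout=timeout)\n",
    "        if res.status_code != 200:\n",
    "            # Assumed policy: report and skip failed downloads instead of saving an error page.\n",
    "            print('Skipping %s (HTTP %d)' % (url, res.status_code))\n",
    "            continue\n",
    "        file_path = os.path.join(folder, url.split('/')[-1])\n",
    "        with open(file_path, 'wb') as f:  # closed automatically when the block ends\n",
    "            f.write(res.content)\n",
    "        print('Saved image %d to %s' % (i, file_path))\n",
    "        saved.append(file_path)\n",
    "    return saved\n",
    "\n",
    "# Usage (commented out so this cell does not hit the network when run):\n",
    "# download_images(urls[1:21], 'clothes_img')"
   ]
  },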
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
